Skip to content

ci: add Python version matrix, macOS smoke lane, and path-filter fixes#1247

Merged
kovtcharov-amd merged 5 commits into
mainfrom
feat/ci-matrix-881
Jun 1, 2026
Merged

ci: add Python version matrix, macOS smoke lane, and path-filter fixes#1247
kovtcharov-amd merged 5 commits into
mainfrom
feat/ci-matrix-881

Conversation

@kovtcharov-amd
Copy link
Copy Markdown
Collaborator

setup.py claims python_requires>=3.10 but CI only tested 3.12 — compatibility with 3.10/3.11 was never validated. Unit tests now run against all three supported versions via a strategy matrix, and a continue-on-error macOS smoke job catches Darwin-specific failures before users hit them. MCP tests missed cli.py changes that affect MCP subcommands, and the Lemonade smoke test fired on every PR regardless of what changed.

Test plan

  • Verify Unit Tests (py3.10), Unit Tests (py3.11), Unit Tests (py3.12) appear as separate CI jobs on this PR
  • Verify Unit Tests (macOS smoke) appears and runs (green or yellow, not red — continue-on-error: true)
  • Confirm MCP tests trigger when src/gaia/cli.py is in the changeset
  • Confirm Lemonade smoke test does NOT trigger on this PR (no src/gaia/llm/ changes)

Closes #881

@github-actions github-actions Bot added the devops DevOps/infrastructure changes label May 29, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Clean, targeted CI fix with no blocking issues — approving. Three small nits worth addressing, one of which is a genuine coverage gap.

Summary

This PR patches three real CI problems in one shot: the Lemonade smoke test was burning expensive STX hardware time on every PR regardless of what changed; MCP tests didn't trigger when cli.py (which registers all gaia mcp subcommands) was edited; and setup.py claims python_requires>=3.10 but CI only validated 3.12. All three fixes are correctly scoped. The macOS smoke lane is a useful addition and the continue-on-error: true flag is appropriate for an explicitly experimental job.

Issues Found

🟢 macOS job skips packaging validation (test_unit.yml:164)

The Linux unit-tests job runs pytest tests/unit/test_packaging.py as a dedicated gate before the main suite. The macOS smoke job jumps straight to python -m pytest tests/unit/ -x and skips it entirely. Since packaging validation checks entry-point wiring and __init__.py completeness — both of which are platform-independent — there's no reason to omit it. Any macOS-specific import issue that causes the packaging test to fail (e.g., a platform-conditional import inside a __init__.py) would be silently missed.

      - name: Validate packaging integrity
        run: python -m pytest tests/unit/test_packaging.py -v --tb=short

      - name: Run unit tests (macOS smoke)

🟢 python -m pytest vs bare pytest style inconsistency (test_unit.yml:177)

Every other step in this repo's CI invokes pytest directly after installing into the system environment. The macOS job uses python -m pytest. Neither is wrong, but the inconsistency could mask a misconfigured PATH on macOS runners where the installed pytest shim isn't on $PATH. Either standardise on pytest (matching the rest of the file) or use python -m pytest everywhere — the latter is the safer choice since it guarantees the correct interpreter's pytest is invoked.

🟢 pytest-timeout installed on macOS but not Linux (test_unit.yml:166)

The macOS job installs pytest-timeout (needed for --timeout=60) while the Linux matrix doesn't, so a test that hangs without a timeout on Linux won't be caught until it surfaces on macOS. This isn't a blocker, but it means the two suites are diverging in a way that will make failures harder to diagnose. Consider adding --timeout=60 to the Linux unit-test run as well, or documenting the intentional divergence in a comment.

Strengths

  • Path-filter coverage on test_lemonade_server.yml is complete. The filter covers src/gaia/llm/**, src/gaia/installer/**, both custom actions, and the workflow file itself — nothing relevant is missing, and the expensive STX runner stays idle on doc/agent/UI-only PRs.
  • fail-fast: false on the Python matrix is the right call. Lets all three versions run to completion so a 3.10-only failure isn't hidden by an earlier 3.11 failure.
  • continue-on-error: true on the macOS job is correctly justified in both the inline comment and PR description — it follows the established GAIA pattern for experimental CI lanes that shouldn't block until they're fully green.

Verdict

Approve with suggestions. The three findings above are all minor. The packaging-validation gap (test_unit.yml:164) is the most meaningful one and is trivially fixed with the suggestion block above. The core improvements in this PR — path filtering on Lemonade, cli.py in MCP triggers, Python version matrix — are correct and ship real value.

@github-actions
Copy link
Copy Markdown
Contributor

Solid CI hygiene across the board — the path filters, version matrix, and macOS smoke lane all address real gaps that setup.py's python_requires>=3.10 declaration had left unvalidated. Two minor observations worth fixing, one informational gap worth noting.


Issues

🟢 Minor — Lemonade path filters don't cover the Lemonade test files (test_lemonade_server.yml)

The new path list covers source and infra, but not the three root-level test files that exercise Lemonade:

  • tests/test_lemonade_client.py
  • tests/test_lemonade_embeddings.py
  • tests/test_lemonade_health.py

If any of those are modified in isolation, the Lemonade smoke test won't trigger — the same gap this PR fixes for MCP (tests/mcp/**). Add a parallel entry to both push and pull_request paths:

      - 'src/gaia/llm/**'
      - 'src/gaia/installer/**'
      - 'setup.py'
      - '.github/workflows/test_lemonade_server.yml'
      - '.github/actions/install-lemonade/**'
      - '.github/actions/setup-venv/**'
      - 'installer/**'
      - 'tests/test_lemonade*.py'

(Same addition needed in the pull_request block.)


🟢 Minor — pytest-timeout missing from Linux matrix job (test_unit.yml:69)

The macOS smoke job correctly installs pytest-timeout and passes --timeout=60, but the Linux matrix job skips the package. Any test decorated with @pytest.mark.timeout(...) (or the --timeout default in a pytest.ini) silently has no enforcement on Linux. Adding it keeps the two job definitions consistent:

          uv pip install --system pytest pytest-cov pytest-asyncio pytest-mock pytest-timeout pyfakefs \
                                  keyring httpx respx

Informational — continue-on-error: true is at the job level, not the step level (test_unit.yml:152)

CLAUDE.md documents step-level continue-on-error for permission-warning tolerance; this PR uses it at the job level for a different purpose (experimental Darwin coverage). The intent is clearly documented in the inline comment ("continue-on-error until the full suite is macOS-clean"), so this is not a blocker — but the author should be aware that a broken macOS job produces no PR status signal until the flag is removed. Tracking that graduation in issue #881 (or a follow-up) would help avoid the flag living indefinitely.


Strengths

  • fail-fast: false on the Python matrix is the right call — a 3.10 failure shouldn't mask whether 3.12 also breaks.
  • Adding src/gaia/cli.py to the MCP trigger is a precise, correct fix: cli.py owns the MCP subcommand dispatch, so a change there that breaks gaia mcp previously flew under the radar.
  • merge_group and workflow_dispatch correctly carry no path restriction — always-run semantics for the merge queue are preserved.

Verdict

Approve with suggestions. The two minor issues are independent and both have one-line fixes; the path-filter gap for Lemonade test files is the one worth landing before this merges.

@kovtcharov-amd
Copy link
Copy Markdown
Collaborator Author

The Unit Tests (py3.10) failure is a real bug this PR intentionally exposed: test_symlink_to_blocked_directory_is_blocked fails on Python 3.10 because Path.resolve() handles symlinks differently. PR #1256 fixes the underlying is_write_blocked function to work correctly on Python 3.10+ — once it merges and this PR rebases, py3.10 will pass.

@kovtcharov-amd kovtcharov-amd added the p2 low priority label May 29, 2026
@kovtcharov-amd kovtcharov-amd self-assigned this May 29, 2026
itomek
itomek previously approved these changes May 29, 2026
Copy link
Copy Markdown
Collaborator

@itomek itomek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path-filtering, version matrix, and macOS lane all address real gaps. One merge-ordering point worth calling out (not a code change here): this PR's premise is validating 3.10/3.11, but the py3.10 leg is currently red and you've noted it only goes green after #1256 lands and this rebases. Since that matrix leg has no continue-on-error (unlike the macOS lane), merging this before #1256 turns py3.10 into a permanently-failing required check — recommend sequencing #1256 first, then rebase and confirm green. The Lemonade path-filter gap is noted inline. pytest-timeout is enforced on macOS but not the Linux matrix — minor consistency nit. Approving on the understanding that the merge order is handled.


Generated by Claude Code

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new path filter covers src/gaia/llm, installer, and actions, but not the root-level Lemonade test files. A PR editing only tests/test_lemonade_client.py (or _embeddings/_health) won't trigger this smoke test — the same gap you just fixed for MCP by adding src/gaia/cli.py. Add tests/test_lemonade*.py to both the push and pull_request paths blocks.


Generated by Claude Code

@itomek itomek enabled auto-merge May 29, 2026 13:47
@itomek itomek disabled auto-merge May 29, 2026 13:47
@github-actions github-actions Bot added tests Test changes security Security-sensitive changes labels May 29, 2026
itomek-amd
itomek-amd previously approved these changes May 29, 2026
Ovtcharov added 2 commits May 29, 2026 09:33
setup.py claims python_requires>=3.10 but CI only tested 3.12. Unit tests
now run against 3.10/3.11/3.12 via a strategy matrix. A macOS smoke job
(continue-on-error) catches platform-specific failures early. test_mcp.yml
gains cli.py in its path filter so MCP subcommand changes trigger the suite,
and test_lemonade_server.yml gets a path filter scoped to src/gaia/llm/
instead of firing on every PR.

Closes #881
Add packaging validation step to the macOS smoke job matching the Linux
job. Switch from `python -m pytest` to bare `pytest` for consistency
with the rest of the workflow.
Lemonade path filters missed tests/test_lemonade*.py, so edits to those
files wouldn't trigger the smoke test. Linux matrix was also missing
pytest-timeout, creating a divergence with the macOS job.
pull Bot pushed a commit to bhardwajRahul/gaia that referenced this pull request May 29, 2026
…#1256)

Symlinks pointing into blocked directories (`.ssh`, `C:\Windows`,
`/etc`, …) were not detected by `is_write_blocked()` on Python <3.12 —
the write guardrail returned `False` when it should have returned
`True`. The Python 3.10/3.11 version matrix (amd#1247) surfaced this via
`test_symlink_to_blocked_directory_is_blocked`. On 3.12+ the test passed
by accident because `Path.resolve()` internally delegates to
`os.path.realpath`; on older versions it uses a separate resolver that
can disagree after symlink traversal.

The fix drops the redundant `.resolve()` call so `os.path.realpath` is
the single source of truth for symlink resolution across all supported
Python versions.

## Test plan
- [ ] `python -m pytest tests/unit/test_security_edge_cases.py
tests/unit/test_file_write_guardrails.py -xvs` — all 132 pass, 6 skipped
(platform-specific)
- [ ] CI matrix passes on Python 3.10, 3.11, 3.12, 3.13

Co-authored-by: Ovtcharov <kovtchar@amd.com>
@kovtcharov-amd
Copy link
Copy Markdown
Collaborator Author

Addressed @itomek's review: \ is in both \ and \ path blocks (lines 18, 30). Also merged head of main and all other review comments were already addressed (packaging validation on macOS, pytest-timeout on Linux, symlink fix cherry-picked).

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

Three surgical CI fixes in one well-scoped PR — the Python matrix, macOS smoke lane, and path filter corrections are all sound. One minor omission worth fixing before merge.


Issues Found

🟢 Minor — unit-tests-macos is missing timeout-minutes (.github/workflows/test_unit.yml:159)

The Linux unit-tests job caps at 30 minutes (timeout-minutes: 30, line 44). The macOS job has no cap, so a hanging test could hold a macOS runner for up to 6 hours — GitHub's hard default. The job already passes --timeout=60 to pytest (per-test timeout), but a deadlock at the fixture or import level would bypass that.

  unit-tests-macos:
    name: Unit Tests (macOS smoke)
    runs-on: macos-latest
    timeout-minutes: 30
    if: github.event_name != 'pull_request' || github.event.pull_request.draft == false || contains(github.event.pull_request.labels.*.name, 'ready_for_ci')
    continue-on-error: true

Strengths

  • Path filters are comprehensive. test_lemonade_server.yml covers source, installer, shared actions, workflow file itself, and root-level test globs — not just src/gaia/llm/**. This is the level of detail that prevents false negatives on indirect changes.
  • fail-fast: false on the matrix is the right call. Stopping at the first failing Python version would hide whether the failure is version-specific; seeing all three outcomes at once is much more useful for diagnosis.
  • continue-on-error is appropriately scoped and documented. The inline comment ("continue-on-error until the full suite is macOS-clean") explains the temporary nature, and CLAUDE.md's guidance on job-level tolerance aligns with this use case — it's known experimental failure, not silent degradation.

Verdict

Approve with suggestion — one missing timeout-minutes is the only thing worth fixing; everything else is correct. Apply the suggestion above and this is ready to merge.

_is_corrupt_download_error deliberately excludes 'llama-server failed
to start' — it fires for too many non-corruption failures. The test
was asserting True but the implementation correctly returns False.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

Clean CI hardening PR with one substantive test bug fix and three well-scoped workflow improvements. No blocking issues.


Summary

The test inversion in test_llamacpp_backend.py is the most important change here: the old assertion was wrong — _is_corrupt_download_error never matched "llama-server failed to start" (it's absent from _CORRUPT_DOWNLOAD_PHRASES), so the test was silently verifying the opposite of the documented safety invariant. The rest of the PR delivers real CI cost/coverage value: a Python 3.10–3.12 matrix, macOS smoke lane, Lemonade path filters, and cli.py in the MCP trigger set. All four changes are targeted and scope-clean.


Issues Found

🟢 Minor — Stale class docstring (tests/unit/test_llamacpp_backend.py:588)

The class-level docstring reads "Verify _is_corrupt_download_error catches llama-server startup failures." — but after the fix the class verifies the opposite: that startup failures are not treated as corruption. A reader scanning the class without reading each test will form the wrong mental model.

    """Verify _is_corrupt_download_error correctly classifies corruption signals."""

🟢 Minor — pytest-timeout installed but not applied on the Linux matrix (test_unit.yml)

pytest-timeout is added to the Linux uv pip install step (line 70) and used with --timeout=60 in the macOS smoke job (line 196). The main Linux unit-test run (line 92) doesn't pass --timeout, so runaway tests will still block the 30-minute job rather than fail loudly at the test level. This is not urgent — the job-level timeout-minutes: 30 provides a backstop — but aligning the two jobs makes the per-test limit actionable from CI logs.

          pytest tests/unit/ -v --tb=short --timeout=60 \
            --cov=src/gaia --cov-report=term-missing --cov-report=xml:coverage.xml

🟢 Observation — continue-on-error: true at job level vs. step level

CLAUDE.md's note on continue-on-error describes step-level use for known non-fatal permission warnings. The macOS job applies it at the job level, which is a broader scope: a silent regression in the macOS lane will show yellow rather than red and could go unnoticed. The PR description calls this out explicitly ("until the full suite is macOS-clean"), so the intent is clear and the approach is reasonable for an experimental lane. Worth tracking as a ticket to remove continue-on-error once the macOS suite is stable.


Strengths

  • Test fix is correct and carries the right docstring. The method-level docstring on test_llama_server_failed_to_start_is_not_corruption explains exactly why the phrase was removed from the corruption set (resource limits, ctx_size, port conflicts → destructive delete path). That's the kind of comment CLAUDE.md asks for.
  • fail-fast: false on the version matrix is the right call. Letting all three Python versions report results independently is more useful than short-circuiting on the first failure — especially since 3.10/3.11 failures may differ from 3.12.
  • Path filters on test_lemonade_server.yml are comprehensive. Covering src/gaia/llm/**, installer/**, .github/actions/install-lemonade/**, tests/test_lemonade*.py, and setup.py will meaningfully reduce the expensive AMD hardware job's trigger rate. Adding src/gaia/cli.py to the MCP filter is a subtle but correct catch — MCP subcommands are wired through cli.py.

Verdict

Approve with suggestions. The stale class docstring is the only change I'd ask for; the --timeout alignment is a nice-to-have. Nothing blocking.

@kovtcharov-amd kovtcharov-amd enabled auto-merge June 1, 2026 17:11
@kovtcharov-amd kovtcharov-amd disabled auto-merge June 1, 2026 17:11
@kovtcharov-amd kovtcharov-amd merged commit a12c72a into main Jun 1, 2026
32 of 33 checks passed
@kovtcharov-amd kovtcharov-amd deleted the feat/ci-matrix-881 branch June 1, 2026 17:11
@kovtcharov-amd kovtcharov-amd mentioned this pull request Jun 1, 2026
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

devops DevOps/infrastructure changes p2 low priority security Security-sensitive changes tests Test changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CI: macOS lane + Python version matrix + path-filter audit

3 participants